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Abstract. A detailed study of energy differences between the highest occupied 
and lowest unoccupied molecular orbitals (HOMO-LUMO gaps) in protein 
systems and water clusters is presented. Recent work questioning the applicability 
of Kohn-Sham density-functional theory to proteins and large water clusters 
(E. Rudberg, J. Phys.: Condens. Mat. 2012, 24, 072202) has demonstrated 
vanishing HOMO-LUMO gaps for these systems, which is generally attributed 
to the treatment of exchange in the functional used. The present work shows 
that the vanishing gap is, in fact, an electrostatic artefact of the method used 
to prepare the system. Practical solutions for ensuring the gap is maintained 
when the system size is increased are demonstrated. This work has important 
implications for the use of large-scale density-functional theory in biomolecular 
systems, particularly in the simulation of photoemission, optical absorption and 
electronic transport, all of which depend critically on differences between energies 
of molecular orbitals. 



PACS numbers: 87.15.A-, 71. 15. Mb, 33.15.Kr, 36.40. Cg, 31.15.E-, 87.10.-e 
1. Introduction 

Density-functional theory (DFT) [ll [2] simulations are becoming increasingly 
widespread for simulating biological systems at the level of individual atoms and 
electrons. The advantages of DFT over, for example, molecular mechanics (MM) 
force fields include correct treatment of long-range polarisation, charge transfer, bond 
breaking and forming, electronic states and, by association, spectroscopy. In addition, 
transition metals, unusual ligands or functional groups that are difficult to correctly 
parameterise within force fields are also accurately treated. A limiting factor in 
applying conventional first-principles approaches to large systems is the unfavourable 
computational requirements which typically increase as the third (or greater) power 
of the number of atoms in the system. However, methods to overcome this bottleneck 
using linear-scaling approaches to DFT have been under development for over a 
decade [3] and are still advancing today [4], allowing system sizes on the order of 
tens of thousands of atoms to be routinely accessed [5]. 
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As larger systems become accessible, the application of density-functional 
methods to systems of biological interest is becoming increasingly common. The FHI- 
aims package [B] has been applied to folding processes in helices and polypeptides [7]. 
The TeraChem package [5] has recently been used to optimise the structures of more 
than 50 polypeptides of sizes ranging up to 590 atoms, predicting protein structure 
with the same accuracy as empirical force fields paramaterised extensively for this 
purpose [9]. Linear-scaling DFT calculations have been reported on a dry DNA 
model comprising 715 atoms using the SIESTA code [lU] and a B-DNA decamer 
using the CONQUEST code [TT] with explicit water molecules and counter ions ^n\ . 
The ONETEP code [IH], applied in the present investigation, has previously been 
used to perform geometry optimisations to characterise binding energetics of small 
molecules to the metalloprotein myoglobin |14j and to measure binding of small 
molecules to T4 lysozyme in solution |15| . Recent developments in the simulation 
of optical spectroscopy [16j , dynamical mean field theory with applications to human 
respiration [T^ and methods to aid interpretation of the electronic structure [T51 [TH] 
broaden the scope of biomolecular simulations still further. 

Despite considerable interest in the use of ab initio simulations for the study of 
complex biomolecular systems, there is a growing concern that DFT, in conjunction 
with pure exchange-correlation functionals (those defined as containing no Hartree- 
Fock exchange), may be inappropriate for the simulation of large molecular clusters. 
In particular, it has recently been shown that there are unphysical vanishing HOMO- 
LUMO gaps in systems such as proteins [20] and even water clusters [21]. This can 
lead to poor convergence during the self-consistent electronic structure optimisation 
procedure. Poor self-consistent field (SCF) convergence has been found for BLYP 
and B3LYP DFT functionals [Hj and for LDA and PBE functionals when simulating 
large glutamic acid-alanine helices [53]. Attempts to optimise polypeptide structures 
in vacuo using the TeraChem package with BLYP and B3LYP functionals reported a 
lack of SCF convergence for many of the peptides simulated [9] . Similar problems have 
been observed when using molecular fractionation with conjugated caps to compute 
ab initio binding energies in vacuum for protein-ligand complexes of between 1000 and 
3000 atoms [^ . 

SCF convergence issues are widely blamed upon the well-known phenomenon 
of pure functionals under-estimating the HOMO-LUMO gap [25l |26l 122 ■ However, 
while the lack of the derivative discontinuity of the exchange-correlation potential at 
integer particle numbers and errors in the single-particle eigenvalues resulting from 
the approximate nature of the functional itself will indeed reduce the gap, there is no 
obvious reason why the effect should worsen at larger system sizes [3S] . Indeed, it has 
been shown that recovery of a sizeable gap and consequently robust self-consistent 
convergence of the electronic energy levels is possible by including electrostatically 
embedded point charges to represent water molecules around an inner cluster treated 
with quantum mechanics (QM) [5D1 [55] or by simulating the system in a dielectric 
medium [23] . These results point to the possibility that the vanishing gap is a surface 
effect and not an inherent difficulty with pure Kohn-Sham DFT. 

In this communication, we use the linear-scaling DFT code ONETEP [13] to 
investigate the HOMO-LUMO gap of water clusters and protein systems. As in 
previous studies 20 we find that the gap often vanishes in vacuo. However, we 
provide conclusive evidence that the issue is not related to the use of a pure exchange- 
correlation functional but rather is a result of the approach used to prepare the system. 
We suggest a number of practical measures for preparing large systems that do not lead 
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to a vanishing HOMO-LUMO gap and thus open the way for continued investigations 
of biomolecular systems with Kohn-Shani DFT. 

2. Computational Method 

The present work uses ONETEP [13], a hnear-scahng DFT package designed for 
use on parallel computers [29l that combines near-complete basis set accuracy with 
a computational cost that scales linearly with the number of atoms. This allows 
an accurate QM description of systems of thousands of atoms [5], including entire 
proteins [TH [3D1 EI]- The ONETEP formalism is based on a reformulation of 
conventional Kohn-Sham DFT [1] [2] in terms of the single-particle density matrix 



where (/'^(r) are non-orthogonal generalised Wannier functions (NGWFs) [55] that 
are localized in real space. The density kernel K"^ is a representation of the density 
matrix in the biorthogonal duals of the NGWFs. ONETEP achieves linear scaling by 
exploiting the "nearsightedness" of the single-particle density matrix in non-metallic 
systems [33l [34] . In practice, linear scaling arises from enforcing strict localisation 
of the NGWFs onto atomic regions and through the optimisation of the density 
kernel and NGWFs, subject to localisation constraints. Optimising the NGWFs 
in situ allows for a minimal number of atom-centred orbitals to be used whilst 
maintaining plane-wave accuracy. The basis set underlying the NGWFs consists of 
periodic cardinal sine (psinc) functions [35]. The use of a plane- wave basis allows 
an unbiased approach to DFT calculations with systematically improvable accuracy 
by varying a single parameter similar to the energy cutoff in conventional plane-wave 
DFT packages. QM calculations have been performed in ONETEP using the PBE 
gradient corrected exchange-correlation functional [35]. ONETEP will converge to a 
minimised electronic energy irrespective of the filling of the states that is applied at 
the beginning of a simulation. No density kernel truncation has been performed in 
the current calculations. The kernel is optimised using a combination of methods |37) . 
which has the effect of ensuring integer occupancies of zero or one for all single-electron 
eigenstates of the Hamiltonian, which is appropriate in bulk insulators with a clearly 
defined band gap or, as is the case in the systems in the present work, in isolated 
systems with a clear HOMO-LUMO gap. In cases where the systems have a zero 
gap, traditional linear-scaling algorithms for the optimisation of the density kernel are 
not suitable and often converge to unphysical solutions not corresponding to integer 
occupation of the valence eigenstates. In such cases we have indicated that the system 
did not converge. 

Where indicated in the text the use of implicit solvent has been made. The 
implicit solvent model is fully self-consistent and based on direct solution of the 
inhomogeneous Poisson equation. The approach involves defining the solute cavity 
in terms of an isosurface of the electron density [35] and has been implemented in 
ONETEP [TS] [35]. In addition, the use of electrostatic point charges (denoted by 
QM/EE in the text) has been made. Electrostatic embedding can significantly reduce 
the computational cost associated with density-functional calculations by representing 
a selected portion of the total system in terms of highly localised classical charge 
distributions that are electrostatically coupled with the quantum system, representing 




(1) 
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the effect of the environment in which the quantum system is embedded [40] . ONETEP 
parameters are described in detail in the Supplementary Methods. 

3. Water Clusters 

The correct treatment of water is vital for realistic simulations of biomolecular 
environments. However, recent large-scale DFT simulations with pure functionals 
have encountered SCF convergence problems when simulating isolated water clusters, 
due to the HOMO-LUMO gap decreasing to zero when the cluster radius becomes 
larger than approximately 10 A [20 . We begin by using the linear-scaling DFT code 
ONETEP to calculate the band gap of a 2010-atom periodic supercell of bulk water, 
equilibrated using the classical molecular dynamics package AMBER f4r (Supporting 
Methods). Consistent with previous DFT calculations of water employing the PBE 
functional and as expected for a bulk insulator, the system has a clearly defined band 
gap of 4.2 eV. We have also calculated the HOMO-LUMO gap of spherical water 
clusters of increasing radius extracted from a larger 50 A cube of water equilibrated 
at 300 K with classical molecular dynamics (MD). For small clusters such as these, 
quantum confinement would generally be expected to result in a HOMO-LUMO gap 
greater than that of the bulk, although one might expect the value of the cluster gap 
to tend to the bulk band gap with increasing system size. However, in agreement with 
previous work [20], Figure [IJa) (black line) shows that the HOMO-LUMO gap quickly 
approaches zero for systems of more than approximately 200 atoms. This observation 
has therefore led many to question the applicability of pure DFT functionals to large 
systems, such as water clusters and proteins [S] [201 1211 HH 112 ■ 

However, our results showing the insulating nature of very large supercells of bulk 
water indicate that the system size in itself is not a problem but, rather, the small 
HOMO-LUMO gap is caused by the vacuum-water interface. Bulk water consists 
of a continuous hydrogen-bonded network of water molecules, each of which has an 
intrinsic dipole moment. The process of extracting a cluster, freezing the atomic 
positions and surrounding with vacuum potentially results in a large surface dipole 
being created. A rough estimate illustrates this: the dipole moment of a single isolated 
water molecule (0.73 e.ao) produces a potential difference of 0.4 V between opposing 
points on a sphere of radius 5 A with the molecule at its centre. While the molecular 
dipoles inside the cluster are oriented so as to mostly cancel any long-ranged effect, 
those on the surface are not compensated by their neighbours, so a large cluster will 
retain a large net dipole moment. Indeed, Figure [ijb) (black line), which shows the 
average dipole moment of clusters of water molecules calculated using a TIP3P point 
charge model for water [43] , reveals that the dipole moment increases with system size 
as the created surfaces become larger in area. A similar trend in dipole moment is 
observed for QM calculations of single snapshots of water clusters of increasing radius 
(Figure SI) and Figure [2j^ a) plots the DFT electrostatic potential on a plane behind 
the 16 A water cluster, revealing a clear dipolar potential. 

We have demonstrated that water clusters extracted from the equilibrated bulk 
display large multipole moments, as measured both by MM and QM. What effect 
does this have on the computed HOMO-LUMO gap? Figure [2ja) also shows the local 
density of electronic energy states (LDoS) for a 16 A water cluster. To determine 
this quantity, clusters are nominally divided into 10 slabs perpendicular to the dipole 
moment, defined as the ^-direction, and the slab LDoS is defined as the sum of the 
contributions to the total DoS from the local orbitals centred on the atoms in each 
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Figure 1. (a) DFT HOMO-LUMO gaps of water clusters of increasing radius 
extracted from a larger 50 A cube of water equilibrated at 300 K using classical 
MD. Black line: Extracted straight from bulk water. Blue line: After classical 
minimisation is performed on each extracted cluster. Red line: Simulating 
the extracted clusters in implicit solvent. Dashed line: HOMO-LUMO gap of 
bulk water, (b) Average dipole moments of water clusters of increasing radius 
calculated using the TIP3P point charge model. Black line: averaged over 5000 
snapshots extracted from a larger 50 A cube of water. Blue line: averaged over 
1800 snapshots extracted from the bulk and minimised using an MM force field. 

slab. We observe a clear shift in the LDoS as a function of the position along the 
dipole moment vector, with the electric field pushing some states higher in energy and 
some lower. Analogous to the concept of Fermi-level pinning in polar semiconductor 
nanorods, where the Fermi energy coincides with a finite density of states at either end 
of the rods (44) I45| , the Fermi energy coincides with a non-zero density of states on 
opposite surfaces of the water cluster. We expect the HOMO-LUMO gap to disappear 
completely when the radius of the water cluster increases to the point where the 
variation of the surface potential is of sufficient magnitude to bridge the gap. 

Given the apparent electrostatic origin of the vanishing gap, we hypothesise that 
there is no fundamental problem in the use of pure functionals in simulations of large 
systems, but simply that the issue manifests itself at smaller system sizes than it would 
for hybrid functionals which have an inherently larger gap. We would, therefore, 
expect any method that corrects these surface effects to also restore the HOMO- 



Electrostatic considerations affecting the calculated HOMO-L UMO gaps 



6 



LUMO gap, which is consistent with observations made in the hterature. It has 
been shown previously that the HOMO-LUMO gap may be restored by embedding 
classical point charges outside the electron distribution to represent, for example, the 
aqueous environment of the water or protein cluster |20j . In these cases, no significant 
changes occurred to the electronic density of the inner water molecules and the only 
significant changes in electronic density were observed on water molecules close to the 
surface |28j . Further, findings that a dielectric medium with a relative permittivity 
of 4 leads to robust SCF convergence of proteins in vacuum [21] imply that screening 
of the surface dipole is sufficient to restore the HOMO-LUMO gap. In the following, 
we test a number of methods for setting up a QM cluster in such a way that DFT 
calculations employing either pure or hybrid functionals may be readily applied. 

The electric field across the cluster will be reduced if the atom positions 
are allowed to relax, either by geometry optimisation or an annealing procedure. 
Figure[ljb) (blue line) reveals that, following classical minimisation, via fast conjugate 
gradient and Newton- Raphson optimisation (Supplementary Methods), the average 
dipole moment of the extracted water clusters is substantially reduced (as measured 
by classical point charges). Furthermore, the HOMO-LUMO gaps of clusters that have 
undergone MM minimisation are all restored to values close to the bulk water value of 
4.2 eV (Figure [TJa) (blue hue)). In addition, an implicit solvent model is expected to 
reduce the shift in electronic states on opposite surfaces by screening the electrostatic 
potential across the cluster. Figure [ija) (red line) shows that when the extracted 
water cluster is simulated with implicit solvent in ONETEP the HOMO-LUMO gap 
is again restored to the bulk value. Figures [2jb,c) confirm that the dipole moment of 
the 16 A water cluster is reduced following MM minimisation and is negligible in the 
dielectric medium. In both cases, the density of electronic states is less dependent on 
the z-coordinate and closely resembles the bulk DoS (green line), as expected for a 
large cluster. 

4. Protein Systems 

Turning our attention to polypeptide systems, six protein conformations (IPLW |46| . 
IFUL [m, IRVS [H], lEDW 02, IFDF [50] and lUBQ [51]) from the 
Brookhaven National Laboratory Protein Data Bank (PDB) f52| were used as starting 
configurations for our calculations. Table [l] shows the computed HOMO-LUMO gaps 
for these structures in vacuo, using the PBE functional. In agreement with Ref. [2U| 
and as expected for this method of system preparation, the calculations did not 
produce HOMO-LUMO gaps (and hence did not converge) for any of the proteins 
apart from the smallest system considered. Figure [3] reveals that the problem is 
similar in nature to that of the water cluster. Plotting the electrostatic potential far 
from the ubiquitin (lUBQ) protein reveals a strong dipole moment. The LDoS reveals 
the dipole moment has a significant effect on the valence states, shifting them to where 
the HOMO-LUMO gap should lie. 

In order to recover the expected HOMO-LUMO gaps, we use similar techniques 
to those described in the previous section (Supplementary Methods). Classical 
minimisation was performed on the IFDF structure in vacuo, but again the electronic 
structure calculation failed to converge. Classical optimisation, though able to restore 
the HOMO-LUMO gap in water clusters, is unsuitable for protein systems. This 
may be understood from considering the reduced opportunity for mobility in proteins, 
compared to water, as residues are fixed in secondary structure conformations. In 
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Figure 2. Electrostatic potential for a 16 A radius water cluster and local density 
of electronic states for groups of atoms as a function of position along the dipole 
moment vector of the cluster. The dipole moment vector (coloured arrow) runs 
from the red line to blue. The black line is the total density of states and the 
green dashed line is the total density of states for bulk water. Each line in the 
LDoS plot is normalised by the number of molecules contained in the slab. The 
electrostatic potential ranges from -0.3 V (red) to -|-0.3 V (blue). The slice is 
24.6 A behind the water cluster, (a) Snapshot extracted from bulk water. The 
dipole moment is high, the LDoS is strongly dependent on position relative to 
the dipole moment vector, and the total range of states is much wider than for 
bulk water, (b) After classical minimisation of the same snapshot, (c) Simulated 
using the ONETEP implicit solvent model. In both cases, the dipole moment is 
reduced and the DoS closely resembles that of the bulk. 



addition, proteins may contain residues that are charged at physiological pH, yet 
when these are simulated in vacuo the charges are unscreened, and thus have a 
stronger and longer-ranged effect on the electrostatics than they do when solvated. 
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Such residues will contribute significantly to the molecular dipole moment and may 
cause an undesired shift in the energies of the surface electronic states. We hypothesise 
that a more suitable strategy is to include the effects of the environment through the 
use of explicit water molecules or implicit solvent, in order to screen the effect of 
charged residues. 

To this end, each protein structure was solvated by a classically minimised 
5 A layer of water and simulated in ONETEP using full DPT for the entire system (up 
to 2386 atoms) to re-calculate the electronic structure. In addition, implicit solvent 
calculations have been used for the protein structures in their vacuum configurations. 
Using either an explicit or implicit solvation strategy restores the HOMO-LUMO 
gaps to similar values. Figure [sj^b) shows that the DoS for the implicitly solvated 
system more closely resembles that of an insulator. The associated variation in the 
electrostatic potential is negligible, as in the case of the water cluster from Pigure[2j^c) 
above, and is not shown. To explore the possibility of reducing the computational cost 
of the calculation, we have also represented the explicit water layer by embedded point 
charges with a TIP3P charge distibution. In this case, the HOMO-LUMO gap is very 
similar to that of the full QM water layer, implying that classical charges produce an 
electrostatic environment appropriate to reduce the net dipole of the system. 



HOMO-LUMO Gap / eV 



PDB ID 


Atoms 


Charge 


in vacuo 


QM water 


Implicit Solvent 


QM/EE 


IPLW 


75(456) 





0.0 


3.7 


3.7 


3.5 


IPUL 


135(453) 


-1 


n/a 


2.7 


2.6 


2.6 


IRVS 


172(670) 





n/a 


3.4 


3.7 


2.9 


lEDW 


399(978) 


-1 


n/a 


3.1 


3.7 


2.6 


IPDP 


419(1526) 


3 


n/a 


1.9 


3.3 


1.6 


lUBQ 


1231(2386) 





n/a 


2.6 


3.4 


2.4 



Table 1. HOMO-LUMO gaps for a range of proteins from the PDB. Atom 
number in parentheses includes a 5 A solvation shell of water used in classical 
minimisation and QM/EE simulations. Systems that did not converge are 
indicated by n/a. Vacuum calculations and implicit solvent simulations did not 
include any explicit water molecules. 



5. Conclusions 

We have confirmed recent findings ^20] that DPT electronic structure optimisation 
is hindered by vanishing HOMO-LUMO gaps in large water and protein clusters 
- systems that should display insulating behaviour. This issue manifests itself in 
clusters prepared with improper treatment of the vacuum/system interface. In the 
examples presented here, unequilibrated vacuum/water interfaces and X-ray protein 
crystal structures exhibit strong dipole moments. Our LDoS calculations, decomposed 
by slabs along the direction of the dipole moment, reveal that electronic states are 
shifted by the electric field across the cluster. The Permi energy is, hence, pinned 
by states on opposite surfaces, which leads to closure of the HOMO-LUMO gap. 
The presence of a 4.2 eV band gap in a 2010-atom model of bulk water calculated 
with the PBE exchange-correlation functional, emphasises that this effect has an 
electrostatic origin, rather than being peculiar to any particular class of functionals. 
Hybrid functionals that tend to have an intrinsically wider gap appear to be able to 
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Figure 3. Electrostatic potential and LDoS for groups of atoms as a function of 
position along the dipole moment vector (coloured arrow) of ubiquitin. The black 
line is the total density of states. Each line in the LDoS plot is normalised by 
the number of molecules contained in the slab, (a) The experimental structure 
with no solvation. The electrostatic potential ranges from -0.2 V (red) to -|-0.2 V 
(blue). The slice is 42.9 A behind the water cluster, (b) LDoS along the dipole 
moment of the same structure simulated with implicit solvent. 

converge clusters comprising thousands of atoms |S1 [Mj , but we would expect HOMO- 
LUMO gap closure, even for functionals containing Hartree-Fock exchange, once DFT 
methodological advances allow access to still larger systems. 

In this communication, we have demonstrated practical solutions for restoring 
the HOMO-LUMO gap, with methods ranging from classical structural optimisation 
of water/vacuum interfaces, to screening of molecular dipole moments via implicit 
solvation of protein structures. In general, implicit solvation gives the best 
correspondence between HOMO-LUMO gaps of large water clusters and the bulk band 
gap, and restores larger HOMO-LUMO gaps for proteins than does explicit solvation. 
In the present work we have looked at systems comprising up to 2386 atoms, and 
the practical solutions demonstrated here will allow the continued investigation of 
biomolecular systems through the use of Kohn-Sham DFT. 
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Supplementary Methods 

AMBER parameters 

Water molecules were generated using the tleap module of AJVIBER with associated 
TIP3P charges. Coulomb interactions were treated using the Particle JVIesh Ewald 
sum, with a real space cutoff of 10 A. The cut-off length for Lennard- Jones interactions 
was also set to 10 A. For the 4763 atom water periodic cube, the system was minimised 
in the NVT ensemble before being heated to BOOK in six stages in the NPT ensemble. 
A production run of 5 ns at 300 K was then performed and snapshots were saved every 
1 ps. The 2010 atom structure for the bulk water HOIvIO-LUlVIO gap calculation in 
ONETEP underwent identical preparation. 

With regards to the protein systems, in the event that starting configurations 
obtained from the PDB had more than one conformation available, the structure 
labelled as 'model 1' was used. All protein interactions were described using the 
AJVIBER ff99SB biomolecular force field '53^. The protein systems were solvated 
in a 50 A water cube with a TIP3P charge distribution. NVT minimisation was 
performed before NPT equilibrating to 300K in six steps. A 5 ns NVT production 
run was then performed to generate the final structures. Throughout minimisation, 
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equilibration and the production run, harmonic constraints of 100 kcal mol^^ A^^ 
were imposed upon the protein. The water molecules were then stripped from the 
structure and these protein configurations were used as our vacuum conformations. 
In order to recover the expected HOMO-LUMO gaps, we use similar techniques to 
those described in the main text. Namely, 5 A of the surrounding water molecules 
were retained from the molecular mechanics equilibration simulations for each protein 
and the solvent geometry was optimised via fast conjugate gradient (CG) followed by 
Newton-Raphson (NR) minimisation until the root mean square force decreased below 
lO"'^ kcal mol~^ during CG minimisation and below 10~^° kcal mol~^ for 
NR minimisation, whilst the protein residues remained fixed. 

ONETEP parameters 

Interactions between electrons and nuclei were described by norm-conserving 
pseudopotentials. NGWFs were initialized as orbitals obtained from solving the Kohn- 
Sham equation for free atoms, with a Is configuration for H, a 2s2p configuration 
for C, N, O and a 3s3p configuration for S [S3]. The NGWFs were expanded in a 
basis of periodic cardinal sine (psinc) functions [35] with an energy cut-off of 916 eV, 
corresponding to a grid spacing of 0.475 ao, and were localised in real space with 
radii of 5.3 A. In the case of the IPLW protein, an increase in NGWF radii from 
5.3 A to 6.4 A led to the change of the HOMO-LUMO gap by 3 meV. These radii at 
a cut-off of 1020 eV give a change in HOMO-LUMO gap of 6 meV when compared 
to a 916 eV cut-off energy calculation. NGWFs are optimised in situ to represent 
the valence states: however, previous experience shows these describe the conduction 
states well for at least the first 1-2 eV above the LUMO and thus produce the same 
gap as equivalent plane- wave calculations. At energies beyond this point, however, the 
DoS is not well-represented by NGWFs ^16; and should be discounted. The spherical 
cut-off approach for Coulomb potentials was used to eliminate all interactions of the 
molecules with their periodic images [55| . 

Where indicated in the text, implicit solvent has been used. Using a smeared- 
ion formalism, the molecular Hartree energy is obtained not in reciprocal space, like 
standard ONETEP, but rather by solving the Poisson equation (homogeneous in 
vacuum, inhomogeneous in solution) in real space, under open boundary conditions. 
This is achieved via a multigrid approach detailed elsewhere [39l[T5]. A 10 oq gap is 
left between the electron density and the edge of the multigrid and the ion smearing 
width is 0.8 ap. The converged electronic density obtained in vacuum was used 
to generate the density-dependent dielectric cavity in solution. The values of the 
solvation parameter /3 and electronic density threshold po were 1.3 and 3.5x10"^ a.u. 
respectively, as proposed in |56j . The relative dielectric permittivity of the solvent 
was set to 80.0 for all implicit solvent calculations. 
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Water Cluster Radius / A 

Figure 4. Calculated total dipole moment of water clusters of increasing radius. 
Black Line: ONETEP calculated QM dipole moment from the final molecular 
dynamics snapshot at each radius. The dipole moment increases with radius in 
the same manner as in our classical simulations. Blue line: ONETEP calculated 
dipole on the same snapshot after classical minimisation, which reduces the dipole 
moment across the cluster. Red line: ONETEP implicit solvent calculations. In 
this case, the dielectric medium supports a higher dipole moment, although the 
net potential is screened at large distances. 



